AITopics | line search

Affine-Invariant Global Non-Asymptotic Convergence Analysis of BFGS under Self-Concordance

Neural Information Processing SystemsJun-18-2026, 23:52:18 GMT

In this paper, we establish global non-asymptotic convergence guarantees for the BFGS quasi-Newton method without requiring strong convexity or the Lipschitz continuity of the gradient or Hessian. Instead, we consider the setting where the objective function is strictly convex and strongly self-concordant. For an arbitrary initial point and any arbitrary positive-definite initial Hessian approximation, we prove global linear and superlinear convergence guarantees for BFGS when the step size is determined using a line search scheme satisfying the weak Wolfe conditions. Moreover, all our global guarantees are affine-invariant, with the convergence rates depending solely on the initial error and the strongly self-concordant constant. Our results extend the global non-asymptotic convergence theory of BFGS beyond traditional assumptions and, for the first time, establish affine-invariant convergence guarantees--aligning with the inherent affine invariance of the BFGS method.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

high_prob_ls_nonconvex_final

Billy Jin

Neural Information Processing SystemsApr-25-2026, 19:27:27 GMT

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.30)

Add feedback

2b323d6eb28422cef49b266557dd31ad-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 06:15:36 GMT

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms

Neural Information Processing SystemsApr-24-2026, 11:08:32 GMT

We develop a framework for analyzing the training and learning rate dynamics on a large class of high-dimensional optimization problems, which we call the high line, trained using one-pass stochastic gradient descent (SGD) with adaptive learning rates. We give exact expressions for the risk and learning rate curves in terms of a deterministic solution to a system of ODEs. We then investigate in detail two adaptive learning rates -- an idealized exact line search and AdaGrad-Norm -- on the least squares problem. When the data covariance matrix has strictly positive eigenvalues, this idealized exact line search strategy can exhibit arbitrarily slower convergence when compared to the optimal fixed learning rate with SGD. Moreover we exactly characterize the limiting learning rate (as time goes to infinity) for line search in the setting where the data covariance has only two distinct eigenvalues. For noiseless targets, we further demonstrate that the AdaGrad-Norm learning rate converges to a deterministic constant inversely proportional to the average eigenvalue of the data covariance matrix, and identify a phase transition when the covariance density of eigenvalues follows a power law distribution.

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

Add feedback

ab41313eaa3cbedbe491c24cbfe6547d-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 08:34:10 GMT

artificial intelligence, inequality, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Add feedback

6d0bf1265ea9635fb4f9d56f16d7efb2-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 18:57:56 GMT

Supplementary Materials for "Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models" Appendix A The Algorithm Appendix B Convergence Rates Appendix B.1 Rate of Convergence for Strongly Convex Functions Appendix B.2 Rate of Convergence for Convex Functions Appendix B.3 Rate of Convergence for Functions Satisfying the PL Condition Appendix B.4 Common Lemmas Appendix B.5 The Polyak Step Size is Bounded Appendix C Experimental details Appendix D Plots Completing the Figures in the Main Paper Appendix D.1 Comparison between PoNoS and the state-of-the-art Appendix D.2 A New Resetting Technique Appendix D.3 Time Comparison Appendix D.4 Experiments on Convex Losses Appendix D.5 Experiments on Transformers Appendix E Additional Plots Appendix E.1 Study on the Choice of c: Theory (0.5) vs Practice (0.1) Appendix E.2 Study on the Line Search Choice: V arious Nonmonotone Adaptations Appendix E.3 Zoom in on the Amount of Backtracks Appendix E.4 Study on the Choice of η In this section, we give the details of our proposed algorithm PoNoS. Training machine learning models (e.g., neural networks) entails solving the following finite sum problem: min Before that, we establish the following auxiliary result. The following Lemma shows the importance of the interpolation property. Lemma 4. W e assume interpolation and that f Let us now analyze case 2). Let us now show that b < 1. B.2 Rate of Convergence for Convex Functions In this subsection, we prove a O ( The above bound will be now proven also for case 2).

artificial intelligence, deep learning, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

6d0bf1265ea9635fb4f9d56f16d7efb2-Paper-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 18:57:52 GMT

Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
North America > Canada > British Columbia (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

e8f2779682fd11fa2067beffc27a9192-Supplemental.pdf

Neural Information Processing SystemsFeb-10-2026, 22:26:39 GMT

In this analysis, we assume that evaluating the GP prior mean and kernel functions (and the corresponding derivatives) takesO(1)time. For each fantasy model, we need to compute the posterior mean and covariance matrix for the L points (x,w1:L), on which we draw the sample paths. This results in a total cost ofO(KML2)to generate all samples. The SAA approach trades a stochastic optimization problem with a deterministic approximation, which can be efficiently optimized. Suppose that we are interested in the optimization problemminxEω[h(x,ω)].

artificial intelligence, optimization, optimization problem, (18 more...)

Neural Information Processing Systems

Country: